(54) System and methods for automatic call and data transfer processing



(57) A programmable automatic call and data transfer
processing system which automatically processes
incoming telephone calls, facsimiles and e-mails based
on the identity of the caller or author, the subject matter
of the message or request, and/or the time of day, which
includes: a central server for automatically answering
an incoming call and collecting voice data of a caller; a
speaker recognition module connected to the server for
identifying the caller or author; a switching module
responsive to the speaker recognition module for
processing the call or message in accordance with a
pre-programmed procedure based on the identification
of the caller or author; and a programming interface for
programming the server, speaker recognizer module
and the switching module. The system is programmed
by the user so as to process incoming telephone calls
or e-mail and facsimile messages based on the identity
of the caller or author, subject matter and content of the
message and the time of day. Such processing
includes, but is not limited to, switching the call to
another system, forwarding the call to another
telephone terminal, placing the call on hold, or
disconnecting the call. In another aspect of the present
invention, the system may be employed to process
information retrieved from other telecommunication
devices such as voice mail, facsimile/modem or e-mail.
The system is capable of tagging the identity of a caller
or participants to a teleconference, and transcribing the
teleconferences, phone conversations and messages of
such callers and participants. The system can
automatically index or prioritize the received calls,
messages, e-mails and facsimiles according to the caller
identification or subject matter of the conversation or
message, and allow the user to retrieve messages that
either originated from a specific source or caller or
retrieve calls which deal with similar or specific subject
matter.

Description

[0001] The present invention relates to a system and
methods for providing automatic call and data transfer
processing and, more particularly, to a system and
methods for providing automatic call and data transfer
processing according to a pre-programmed procedure
based on the identity of a caller or author, the subject
matter and content of a call or message and/or the time
of day of such call or message.
[0002] Generally, in the past, call processing has been
manually performed either by a business owner, a
secretary or a local central phone service. There are
certain conventional devices which partially perform some
call processing functions. For example, conventional
answering machines and voice-mail services record
incoming telephone messages which are then played
back by the user of such devices or services. In
addition, desktop-telephone software or local PBXs (private
branch exchanges) provide telephone network switching
capabilities. These conventional answering machines,
voice-mail services and switching systems, however,
are not capable of automatically performing distinct
processing procedures that are responsive to the
identity of the caller or evaluating the content or subject
matter of the call or message and then handling such call or
message accordingly. Instead, the user must first
answer his or her telephone calls manually, or retrieve
such calls from an answering machine or voice-mail,
and then decide how to proceed on a call-by-call basis.
The present invention eliminates or mitigates such
burdensome manual processing.
[0003] Moreover, although protected by Dual Tone
Multi-Frequency (DTMF) keying, answering machines
and voice-mail services are unable to identify or verify
the caller when being remotely accessed or
re-programmed by a caller with a valid personal identification
number (PIN) which is inputted by DTMF keys. Further,
conventional teleconference centers also rely on DTMF
PINs for accessibility but are unable to verify and tag the
identity of the speaker during a teleconference. Such
answering machines, voice-mail and teleconference
centers may therefore be breached by unauthorized
persons with access to an otherwise valid PIN.
[0004] It is therefore an object of the present invention
to provide a system and methods for automatic call and
data transfer processing in accordance with a
pre-determined manner based on the identity of the caller or
author, the subject matter of the call or message and/or
the time of day.
[0005] It is another object of the present invention to
provide a call processing system which can first
transcribe messages received by telephone, facsimile and
e-mail, as well as other data electronically received by
the system, then tag the identity of the caller (or
participants to a teleconference) or the author of such e-mail
or facsimile messages, and then index such calls,
conversations and messages according to their origin and
subject matter, whereby an authorized user can then
access the system, either locally or remotely, to play
back such telephone conversations or messages or
retrieve such e-mail or facsimile messages in the form
of synthesized speech.

[0006] It is yet another object of the present invention
to provide a system that is responsive (i.e., accessible
and programmable) to voice activated commands by an
authorized user, wherein the system can identify and
verify the user before allowing the user to access calls
or messages or program the system.
[0007] In one aspect of the present invention, a
programmable automatic call and message processing
system comprises: server means for receiving an
incoming call; speaker recognition means, operatively
coupled to the server means, for identifying the caller;
speech recognition means, operatively coupled to the
server means, for determining subject matter and
content of the call; switching means, responsive to the
speaker recognition means and speech recognition
means, for processing the call in accordance with the
identity of the caller and/or the subject matter of the call;
and programming means, operatively coupled to the
server means, speaker recognition means, speech
recognition means and the switching means for
programming the system to perform the processing.
[0008] The system is preferably programmed by the
user so as to process incoming telephone calls in a
pre-determined manner based on the identity of the caller.
Such processing includes, but is not limited to, switching
the call to another system, forwarding the call to another
telecommunication terminal, directing the call to an
answering machine to be recorded, placing the call on
hold, or disconnecting the call.
[0009] In another aspect of the present invention, the
system may be pre-programmed to process an
incoming telephone call, facsimile or e-mail message
according to their content, subject matter, or according to the
time of the day they are received. Still further, the
system may preferably be programmed to process an
incoming telephone call, facsimile or e-mail message
according to a combination of such factors, i.e., the
identity of the caller, the subject matter and content of
the call and the time of day. In addition, e-mail
messages (and other messages created by application
specific software such as LOTUS NOTES) may be
processed in accordance with mood stamps, i.e.,
informational fields provided by certain mailing programs
such as LOTUS NOTES which allow the sender to
indicate the nature of the message such as the
confidentiality or urgency of the message. For future e-mail or
data exchange techniques, such information can be
included in a header of the e-mail or facsimile. Further,
the system may be programmed to prompt the caller to
explicitly advise the system of the nature of the
message. Still further, the system may be configured to
retrieve and process data from other telecommunication
devices such as voice mail systems or answering
machines.
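
By way of illustration only, the combination of factors described
above (identity of the caller, subject matter and time of day) might
be expressed as an ordered list of rules in which the first matching
rule selects the processing procedure. The names Rule and
choose_action, the action strings and the default of "voice_mail" are
assumptions made for this sketch and do not appear in the
specification.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class Rule:
    """One pre-programmed procedure; a field left as None matches anything."""
    action: str
    caller: Optional[str] = None
    subject: Optional[str] = None      # keyword looked for in the subject text
    start: Optional[time] = None       # inclusive start of the time window
    end: Optional[time] = None         # exclusive end of the time window

    def matches(self, caller: str, subject: str, now: time) -> bool:
        if self.caller is not None and self.caller != caller:
            return False
        if self.subject is not None and self.subject not in subject.lower():
            return False
        if self.start is not None and self.end is not None:
            if not (self.start <= now < self.end):
                return False
        return True


def choose_action(rules, caller, subject, now, default="voice_mail"):
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(caller, subject, now):
            return rule.action
    return default


rules = [
    Rule("connect", caller="alice", start=time(9), end=time(17)),
    Rule("forward_to_cell", subject="urgent"),
]
```

In this sketch a call from "alice" during office hours is connected,
any message whose subject contains "urgent" is forwarded, and
everything else falls through to the default.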

[0010] In still a further aspect of the present invention,
the call processing system of the present invention is
capable of tagging the identity of a caller or the
participants to a teleconference, while transcribing the
message or conversations of such callers and participants.
Consequently, the system can automatically manage
telephone messages and conversations, as well as
voice mail, e-mail and facsimile messages, storing
such calls and messages according to their subject
matter or the identity of the caller or author, or both.
Specifically, the present invention can, in combination with
such identification and transcription, automatically index
or prioritize the received telephone calls and e-mail and
facsimile messages according to their origin and/or
subject matter, which allows an authorized user to retrieve
specific messages, e.g., those messages that
originated from a specific source or those which deal with
similar or specific subject matter.
[0011] In another aspect of the present invention, the
system includes text-to-speech capabilities which
allow the system to prompt (i.e., query) the user or
caller in the form of synthesized speech, to provide
answers to questions or requests by the user or caller in
synthesized speech and to play back e-mail and
facsimile messages in synthesized speech. The system also
includes playback capabilities so as to play back
recorded telephone messages and other recorded
audio data.

[0012] These and other objects, features and
advantages of the present invention will become apparent
from the following detailed description of illustrative
embodiments thereof, which is to be read in connection
with the accompanying drawings.

Fig. 1 is a block diagram illustrating general
functions of an automatic call and data transfer
processing system in accordance with the present
invention;

Fig. 2 is a block diagram, as well as a flow diagram,
illustrating the functional interconnection between
modules for a call and data transfer processing
system in accordance with an embodiment of the
present invention; and

Figs. 3a and 3b are flow diagrams illustrating a
method for call or data transfer processing in
accordance with the present invention.

[0013] Referring to Fig. 1, a block diagram illustrating
general functions of an automatic call and data transfer
processing system of the present invention is shown.
The present invention is an automatic call and data
transfer processing machine that can be programmed
by an authorized user (block 12) to process incoming
telephone calls in a manner pre-determined by such
user. Although the present invention may be employed
to process any voice data that may be received through
digital or analog channels, as well as data received
electronically and otherwise convertible into readable
text (to be further explained below), one embodiment of
the present invention involves the processing of
telephone communications. Particularly, the system 10 will
automatically answer an incoming telephone call from a
caller (block 14) and, depending upon the manner in
which the system 10 is programmed by the user (block
12), the system 10 may process the telephone call by,
for example, switching the call to another
telecommunication system or to an answering machine (block 18), or
by handling the call directly, e.g., by connecting,
disconnecting or placing the caller on hold (block 16). In
addition, the system 10 may be programmed to route an
incoming telephone call to various telecommunication
systems in a specific order (e.g., directing the call to
several pre-determined telephone numbers until such
call is answered) or simultaneously to all such systems.
It is to be understood that the telecommunication
systems listed in block 18, as well as the options shown in
block 16 of Fig. 1, are merely illustrative, and not
exhaustive, of the processing procedures that the
system 10 may be programmed to perform.
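
As a minimal sketch of the two routing options just described, the
following assumes a caller-supplied function dial(number) that returns
True when the call is answered at that number. Both function names and
the dial callback are hypothetical; the specification does not define
such an interface.

```python
def route_in_order(numbers, dial):
    """Try each number in the given order; return the first one that
    answers, or None if none of them does."""
    for number in numbers:
        if dial(number):
            return number
    return None


def route_simultaneously(numbers, dial):
    """Offer the call to every number and return all that answer.

    This models only the outcome of ringing all destinations; a real
    system would place the calls concurrently rather than in a loop.
    """
    return [number for number in numbers if dial(number)]
```

The ordered variant stops dialing as soon as one destination answers,
which corresponds to directing the call to several pre-determined
numbers until the call is answered.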
[0014] In another embodiment of the present
invention, the system 10 may be programmed to process
incoming facsimile and e-mail messages, or
automatically retrieve messages from e-mail or voice mail
systems. Thus, it is to be understood that the bidirectional
lines of Fig. 1 connecting the system 10 to the
telecommunication systems in block 18 (e.g., e-mail, voice mail,
facsimile/modem and answering machine) indicate
that the system 10 is designed to send data (e.g., calls
or messages) to such systems, as well as retrieve and
process data stored or recorded in such systems. For
instance, the system 10 may be programmed to process
a particular call by directing the call to an answering
machine (block 18) to be recorded. The system 10 may
subsequently retrieve the recorded message from the
answering machine, which is then decoded and
processed by the system 10 in a particular manner. Further,
the system 10 can be programmed to transform an
incoming telephone call or messages into a page which
can then be transmitted to the user's pager, cellular
phone or e-mail.

[0015] The functional modules of the system 10 and
their specific interaction in accordance with an
embodiment of the present invention will be explained below by
reference to Fig. 2. It is to be understood that same or
similar components illustrated throughout the figures
are designated with the same reference numeral. It is to
be further understood that the functional modules
described herein in accordance with the present
invention may be implemented in hardware, software, or a
combination thereof. Preferably, the main speech and
speaker recognition, language identification modules
and indexing modules of present invention, for example,
are implemented in software on one or more
appropriately programmed general purpose digital computer or
computers, each having a processor, associated
memory and input/output interfaces for executing the
elements of the present invention. It should be understood
that while the invention is preferably implemented on a
suitably programmed general purpose computer or
computers, the functional elements of Fig. 2 may be
considered to include a suitable and preferred
processor architecture for practicing the invention and are
exemplary of functional elements which may be
implemented within such computer or computers through
programming. Further, the functional elements of Fig. 2
may be implemented by programming one or more
general purpose microprocessors. Of course, special
purpose microprocessors may be employed to implement
the invention. Given the teachings of the invention
provided herein, one of ordinary skill in the related art will
be able to contemplate these and similar
implementations of the elements of the invention.
[0016] Referring now to Fig. 2, the system 10 includes
a server 20 preferably connected to various
telecommunication systems including, but not limited to, one or
more telephone lines (block 14) and one or more
facsimile and modem lines (Figs. 1 and 2, block 18) for
receiving and sending telephone calls and message
data, respectively. The server 20 is programmed to
automatically answer incoming telephone calls and
receive incoming facsimile transmissions. The system
10 may also include a permanent internet/intranet
connection for accessing a local network mail server,
whereby the server 20 can be programmed to
periodically connect to such local network mail server (via
TCP/IP) to receive and process incoming e-mails, as
well as send e-mail messages. Alternatively, if the
system 10 is not permanently connected to a local network
server, the system server 20 may be programmed to
periodically dial an access number to an internet
provider to retrieve or send e-mail messages. Such
procedures may also be performed at the option of the user
(as opposed to automatically monitoring such e-mail
accounts) when the user accesses the system 10.
[0017] Further, as shown in Figs. 1 and 2 (block 18),
the server 20 may be directly connected to voice mail
systems and answering machines so as to allow the
user to retrieve and process messages that have been
recorded on such voice-mail and answering machine
systems. If the system 10 is connected to a local
network system, the server 20 may be programmed to
periodically retrieve messages from other voice mail
systems or answering machines which are not directly
connected to the server 20, but otherwise accessible
through the local network, so that the system 10 can
then automatically monitor and retrieve messages from
such voice mail systems or answering machines.
[0018] The server 20 includes a recorder 40 for
recording and storing audio data (e.g., incoming
telephone calls or messages retrieved from voice mail or
answering machines), preferably in digital form.
Furthermore, the server 20 preferably includes a
compression/decompression module 42 for compressing the
digitized audio data, as well as message data received
via e-mail and facsimile, so as to increase the data
storage capability of a memory (not shown) of the system
10 and for decompressing such data before
reconstruction when such data is retrieved from memory.

[0019] A speaker recognizer module 22 and an
automatic speech recognizer/natural language
understanding (ASR/NLU) module 24 are operatively coupled to
the server 20. The speaker recognizer module 22
determines the identity of the caller 14 and participants to a
conference call from the voice data received by the
server 20, as well as the author of a received facsimile
or e-mail message. The ASR/NLU module 24 converts
voice data and other message data received from the
server 20 into readable text to determine the content
and subject matter of such calls, conversations or
messages. In addition, as further demonstrated below, the
ASR/NLU module 24 processes verbal commands from
an authorized user to remotely program the system 10,
as well as to generate or retrieve messages. The
ASR/NLU module 24 also processes voice data from
callers and authorized users to perform interactive voice
response (IVR) functions. A language
identifier/translator module 26, operatively connected to the ASR/NLU
module 24, is provided so that the system 10 can
understand and properly respond to messages in foreign
language when the system is used, for example, in a
multi-language country such as Canada.
[0020] A switching module 28, operatively coupled to
the speaker recognizer module 22 and the ASR/NLU
module 24, processes data received by the speaker
recognizer module 22 and/or the ASR/NLU module 24. The
switching module performs a processing procedure with
respect to incoming telephone calls or facsimile or
e-mail messages (e.g., directing a call to voice-mail or
answering machine) in accordance with a
pre-programmed procedure.

[0021] An identification (ID) tagger module 30,
operatively connected to the speaker recognizer module 22,
is provided for electronically tagging the identity of the
caller to the caller's message or conversation or tagging
the identity of the author of an e-mail or facsimile
message. Further, when operating in the background of a
teleconference, the ID tagger 30 will tag the identity of
the person currently speaking. A transcriber module 32,
operatively connected to the ASR/NLU module 24, is
provided for transcribing the telephone message or
conversation, teleconference and/or facsimile message. In
addition, the transcriber module 32 can transcribe a
verbal message dictated by the user, which can
subsequently be sent by the system 10 to another person via
telephone, facsimile or e-mail.
[0022] An audio indexer/prioritizer module 34 is
operatively connected to the ID tagger module 30 and the
transcriber module 32. The audio indexer/prioritizer
module 34 stores the transcription data and caller
identification data which is processed by the transcriber
module 32 and the ID tagger module 30, respectively,
as well as the time of the call, the originating phone
number (via automatic number identification (ANI) if
available) and e-mail address, in a pre-programmed
manner, so as to allow the user to retrieve specific calls
or messages from a particular party or those calls or
messages which pertain to specific subject matter.
Further, the audio indexer/prioritizer can be programmed to
prioritize certain calls or messages and inform the user
of such calls or messages.
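
One possible way to picture the indexing and prioritizing just
described is a store keyed both by caller and by subject. This is a
sketch under stated assumptions: the class names MessageRecord and
MessageIndex, the field names, and the idea that subjects arrive as
ready-made keywords are all illustrative and are not taken from the
specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MessageRecord:
    caller: str            # identity supplied by the ID tagger
    transcript: str        # text supplied by the transcriber
    received_at: str       # time of the call or message
    origin: str = ""       # originating number (ANI) or e-mail address
    subjects: tuple = ()   # subject keywords (assumed already extracted)


class MessageIndex:
    def __init__(self):
        self._all = []
        self._by_caller = defaultdict(list)
        self._by_subject = defaultdict(list)

    def add(self, record):
        """Store a record and index it by caller and by each subject."""
        self._all.append(record)
        self._by_caller[record.caller].append(record)
        for subject in record.subjects:
            self._by_subject[subject].append(record)

    def from_caller(self, caller):
        return list(self._by_caller.get(caller, []))

    def about(self, subject):
        return list(self._by_subject.get(subject, []))

    def prioritized(self, priority_callers):
        """Priority callers first; arrival order is otherwise preserved."""
        return sorted(self._all, key=lambda r: r.caller not in priority_callers)
```

Retrieval by party uses from_caller, retrieval by subject matter uses
about, and prioritized relies on the stability of Python's sort to keep
arrival order within each group.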

[0023] A speech synthesizer module 36, operatively
connected to the audio indexer/prioritizer module 34,
allows the user to retrieve messages (e-mails or
facsimiles) in audio form (i.e., synthesized speech). The
speech synthesizer is also operatively coupled to the
ASR/NLU module for providing system prompts (i.e.,
queries) in the form of synthesized speech (as opposed
to being displayed, for example, on a computer
monitor).

[0024] A programming interface 38, operatively
coupled to the server 20, speaker recognizer module 22,
language identifier/translator module 26, ASR/NLU
module 24, audio indexer/prioritizer module 34 and the
switching module 28, is provided for programming the
system 10 to process calls and messages in
accordance with a pre-determined procedure. As explained in
detail below, a user may program the system 10 using
the programming interface 38 through either voice
commands or a GUI (graphical user interface), or both. In a
preferred embodiment, the system 10 is programmed
by verbal commands from the user (i.e., voice command
mode). Specifically, the user may program the system
10 with verbal commands either remotely, by calling into
the system 10, or locally with a microphone. The
programming interface 38 is connected to the server 20
which, in conjunction with the speaker recognizer
module 22 and the ASR/NLU module 24, verifies the identity
of the user before processing the verbal programming
commands of the user. The system 10 may either
display (via the GUI) or play back (via the speech
synthesizer 36) information relating to the verbal programming
commands (i.e., whether the system 10 recognizes
such command), as well as the current programming
structure of the system 10.

[0025] In another embodiment, the system 10 may be
programmed locally, through a PC and GUI screen, or
programmed remotely, by accessing the system 10
through a computer network from a remote location.
Similar to a conventional windows interface, the user may
program the system 10 by selecting certain fields which
may be displayed on the GUI. It is to be appreciated that
the system 10 may be programmed through a
combination of voice commands and a GUI. In such a situation,
the GUI may, for example, provide assistance to the
user in giving the requisite voice commands to program
the system 10. Still further, the system 10 may be
programmed by editing a corresponding programming
configuration file which controls the functional modules of
Fig. 2.



[0026] The operation of the present invention will now
be described with reference to Fig. 2 and Figs. 3a and
3b. It is to be understood that the depiction of the
present invention in Fig. 2 could be considered a flow
chart for illustrating operations of the present invention,
as well as a block diagram showing an embodiment of
the present invention. The server 20 is programmed to
automatically answer an incoming telephone call,
e-mail, facsimile/modem, or other electronic voice or
message data (step 100). The server 20 distinguishes
between incoming telephone calls, e-mail messages,
facsimile messages, etc., by special codes, i.e.,
protocols, at the beginning of each message which indicate
the source. Particularly, the server 20 initially assumes
that the incoming call is a telephone communication and
will proceed accordingly (step 110) unless the server 20
receives, for example, a modem handshake signal,
whereby the system 10 will handle the call as a
computer connection protocol. It is to be understood that the
system 10 may be programmed to monitor other voice
mail or e-mail accounts by periodically calling and
retrieving voice mail and e-mail messages from such
accounts.
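
The default-to-telephone behaviour of steps 100 and 110 can be
sketched as follows. The marker byte strings below are invented
placeholders for "special codes at the beginning of each message";
they are not real facsimile, modem or mail protocol signatures, and
the names Source, MARKERS and classify are likewise assumptions.

```python
from enum import Enum


class Source(Enum):
    TELEPHONE = "telephone"
    FAX = "facsimile"
    EMAIL = "e-mail"


# Hypothetical leading markers standing in for a protocol handshake.
MARKERS = {
    b"FAX-HANDSHAKE": Source.FAX,
    b"MAIL-HANDSHAKE": Source.EMAIL,
}


def classify(first_bytes: bytes) -> Source:
    """Treat the connection as a telephone call unless the data begins
    with one of the known (here: made-up) protocol markers."""
    for marker, source in MARKERS.items():
        if first_bytes.startswith(marker):
            return source
    return Source.TELEPHONE
```

The point of the sketch is only the control flow: the telephone path
is the fall-through case, and other sources are selected by what is
seen at the start of the incoming data.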

[0027] If it is determined that the incoming call
received by the server 20 is a telephone call, the audio
data (e.g., incoming calls as well as calls retrieved from
voice mail or answering machines) is recorded by the
recorder 40 (step 112). The recorder 40 may be any
conventional device such as an analog recorder or
digital audio tape ("DAT"). Preferably, the recorder 40 is a
digital recorder, i.e., an analog-to-digital converter for
converting the audio data into digital data. The digitized
audio data may then be compressed by the
compression/decompression module 42 (step 114) before being
stored (step 116) in memory (not shown in Fig. 2). It is
to be appreciated that any conventional algorithm, such
as those disclosed in "Digital Signal Processing,
Synthesis and Recognition" by S. Furui, Dekker, 1989, may
be employed by the compression/decompression
module 42 to process the message data.
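
The compress-then-store sequence of steps 114 and 116, and the
decompression on retrieval, can be illustrated with a small in-memory
store. The sketch substitutes Python's general-purpose zlib codec for
the speech coding algorithms referred to above, purely because it is
readily available; the class name MessageStore and its methods are
hypothetical.

```python
import zlib


class MessageStore:
    """In-memory stand-in for the system memory (not a real device)."""

    def __init__(self):
        self._blobs = {}

    def store(self, key: str, data: bytes) -> int:
        """Compress the data, keep it, and return the stored size in bytes."""
        blob = zlib.compress(data)
        self._blobs[key] = blob
        return len(blob)

    def retrieve(self, key: str) -> bytes:
        """Decompress and return the original data."""
        return zlib.decompress(self._blobs[key])
```

Because zlib is lossless, retrieve returns exactly the bytes that were
stored; a speech codec of the kind contemplated in the text would
typically trade some fidelity for a higher compression ratio.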
[0028] Next, simultaneously with the recording and
storing of the audio data, the identity of the caller is
determined by processing the caller's audio
communications and/or audio responses to queries by the
system 10. Specifically, the caller's verbal statements and
responses are received by the server 20 and sent to the
speaker recognizer module 22, wherein such verbal
statements and responses are processed and
compared with previously stored speaker models (step 120).
If the speaker is identified by matching the received
voice data with a previously stored voice model of such
speaker (step 130), and if the system 10 is
pre-programmed to process calls based on the identity of a
caller, the system 10 will then process the telephone call
in accordance with such pre-programmed procedure
(step 152).

[0029] If, on the other hand, the speaker (e.g., a first
time caller) cannot be identified via the previously






EP0935378A2 






stored voice models, speaker identification may be
performed by both the speaker recognizer module 22 and
the ASR/NLU module 24, whereby the content of the
telephone message may be processed by the ASR/NLU
module 24 to extract the caller's name which is then
compared with previously stored names to determine
the identity of such caller (step 140). If the identity of the
caller is then determined, the system 10 will process the
telephone call in accordance with the identity of the
caller (step 152).

[0030] In the event that the system 10 is unable to
identify the caller from either the stored voice models or
the content of the telephone message, the speaker
recognizer module 22 sends a signal to the server 20
which, in turn, prompts the caller to identify him or
herself with a query, e.g., "Who are you?" (step 150) and the
above identification process is repeated (step 120). The
server 20 obtains the query in synthesized speech from
the speech synthesizer module 36. It is to be understood
that, as stated above, the system 10 may be
programmed to initially prompt the caller to identify him or
herself or ask details regarding the reason for the call.
[0031] Once the caller or author has been identified by
the speaker recognizer module 22, a signal is sent by
the speaker recognizer module 22 to the switching
module 28, whereby the switching module 28 processes the
call or message based on the identity of the caller or
author in accordance with a pre-programmed procedure
(step 152). If, on the other hand, the identity of the caller
ultimately cannot be identified, the system 10 may be
programmed to process the call based on an unknown
caller (step 154) by, e.g., forwarding the call to a voice
mail. Such programming, to be further explained, is
performed by the user 12 through the programming
interface module 38. As stated above, the processing
options which the system 10 may be programmed to
perform include, but are not limited to, switching the call
to another system, directing the call to another
telecommunication terminal (Figs. 1 and 2, block 18) or directly
handling the call by either connecting the call to a
particular party, disconnecting the call, or placing the call on
hold (Figs. 1 and 2, block 16).
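
The identification sequence of steps 120 through 154 can be
summarized as a fallback chain: voice model match, then name
extraction from the message content, then a spoken prompt, and finally
treatment as an unknown caller. In the sketch below, match_voice,
extract_name and ask are caller-supplied stand-ins for the speaker
recognizer, the ASR/NLU module and the synthesized prompt; these
names, and the limit on the number of prompts, are assumptions.

```python
def identify_caller(utterance, match_voice, extract_name, known_names,
                    ask, max_prompts=1):
    """Return the caller's identity, or None for an unknown caller."""
    for attempt in range(max_prompts + 1):
        # Steps 120/130: compare against stored voice models.
        speaker = match_voice(utterance)
        if speaker is not None:
            return speaker
        # Step 140: look for a known name in the message content.
        name = extract_name(utterance)
        if name in known_names:
            return name
        # Step 150: prompt the caller and repeat the process.
        if attempt < max_prompts:
            utterance = ask("Who are you?")
    # Step 154: the caller remains unidentified.
    return None
```

A return value of None would then be mapped by the switching logic to
whatever procedure has been programmed for unknown callers, such as
forwarding the call to voice mail.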
[0032] It is to be appreciated that whenever a new
caller interacts with the system 10 for the first time,
speaker models are built and stored in the speaker
recognizer module 22, unless erased at the option of the
user. Such models are then utilized by the speaker
recognizer module 22 for identification and verification
purposes when that caller interacts with the system 10 at a
subsequent time.

[0033] It is to be appreciated that the system 10 may
perform speaker identification by utilizing methods other
than acoustic features when the requisite voice models
do not exist. For example, with regard to telephone
calls, the system 10 may utilize additional information
(e.g., caller ID) to enhance the accuracy of the system
10 and/or to identify first time callers.
[0034] As further explained below, the system 10 may
be programmed to store the name and originating
telephone number of every caller (or specified callers).
Such capability allows the user to automatically send
reply messages to callers, as well as dynamically create
an address book (which is stored in the system 10)
which can be subsequently accessed by the user to
send a message to a particular person.
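
A dynamically created address book of the kind described above might
be sketched as a mapping from caller name to originating number, with
an optional list of "specified callers". The class and method names
are illustrative assumptions rather than part of the disclosure.

```python
class AddressBook:
    def __init__(self, only=None):
        # If `only` is given, entries are kept just for those callers;
        # otherwise every caller is recorded.
        self._only = set(only) if only is not None else None
        self._numbers = {}

    def record_call(self, name: str, number: str) -> bool:
        """Store (or update) the caller's number; return True if stored."""
        if self._only is not None and name not in self._only:
            return False
        self._numbers[name] = number
        return True

    def number_for(self, name: str):
        """Return the last recorded number for the caller, or None."""
        return self._numbers.get(name)
```

The stored number is what a later reply message to that caller would
be addressed to.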
[0035] It is to be understood that depending upon the
application, it is not necessary that the system 10
perform speaker recognition and natural language
understanding in real time (i.e., simultaneously with the
recording and during the time period of the actual
telephone call) in every instance. For example, the system
10 can be programmed to query the caller (via IVR
programming) to obtain relevant information (i.e., name
and reason for call) at the inception of the call and store
such information. The identification process may then
be performed by the speaker recognizer module 22 or
the ASR/NLU module 24 subsequent to the call by
retrieving the stored audio data from memory (step 118)
(as indicated by the dotted line in Fig. 3a).
[0036] It is to be understood that any type of speaker
recognition system may be utilized by the speaker
recognizer module 22 for identifying the caller. Preferably,
the speaker recognition system employed in
accordance with the present invention is the system which
performs text-independent speaker verification and asks
random questions, i.e., a combination of speech
recognition, text-independent speaker recognition and natural
language understanding as disclosed in U.S. Serial No.
08/871,784, filed on June 11, 1997, and entitled:
"Apparatus And Methods For Speaker Verification /
Identification / Classification Employing Non-Acoustic And/Or
Acoustic Models and Databases," the disclosure of
which is incorporated herein by reference. More
particularly, the text-independent speaker verification system
is preferably based on a frame-by-frame feature
classification as disclosed in detail in U.S. Serial No.
08/788,471 filed on January 28, 1997 and entitled: "Text
Independent Speaker Recognition for Transparent
Command Ambiguity Resolution And Continuous
Access Control," the disclosure of which is also
incorporated herein by reference.

[0037] As explained in the above-incorporated
reference U.S. Serial No. 08/871,784, text-independent
speaker recognition is preferred over text-dependent or
text-prompted speaker recognition because text
independence allows the speaker recognition function to be
carried out in parallel with other speech
recognition-based functions in a manner transparent to the caller
without requiring interruption for new commands or
identification of a new caller whenever a new caller is
encountered.

[0038] Next, referring to Fig. 3b (and assuming the
system 10 is programmed to process calls based on the
identity of a caller or author), if it is determined that the
incoming call is a facsimile or e-mail message, the
message data (e.g., incoming e-mails or messages
retrieved from e-mail accounts) are processed by the
ASR/NLU module 24 (step 190), compressed (step
192), and stored (step 194) in memory (not shown).
With regard to e-mail messages, the data is directly
processed (since such data is already in text format).
With regard to facsimile messages, the ASR/NLU
module 24 employs optical character recognition (OCR)
using known techniques to convert the received
facsimile message into readable text (i.e., transcribe the
facsimile message into an ASCII file).
[0039] Next, simultaneously with the transcribing and
storing of the incoming message data, the identity of the
author of such message may be determined via the
ASR/NLU module 24 whereby the content of the
incoming message is analyzed (step 200) to extract the
author's name or the source of the message, which is
then compared with previously stored names to
determine the identity of such author (step 210). If the author
is identified (step 210), the message can be processed
in accordance with a pre-programmed procedure based
on the identity of the author (step 222). If, on the other
hand, the identity of the author cannot be identified, the
message may be processed in accordance with the
pre-programmed procedure for an unidentified author (step
224).
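
By way of illustration only, the author identification of
steps 200 to 224 may be sketched as follows. The sketch
assumes that the author's name appears on a "From:" line
of the transcribed text; the names KNOWN_AUTHORS,
extract_author and process_message, and the procedure
labels, are hypothetical and form no part of the disclosure.

```python
import re

# Hypothetical store of previously stored author names mapped to the
# pre-programmed procedure for each (labels are illustrative only).
KNOWN_AUTHORS = {
    "jane doe": "forward_to_user",
    "bob smith": "store_and_index",
}


def extract_author(message_text):
    """Return the name found on a 'From:' line, or None if there is none."""
    match = re.search(
        r"^From:\s*(.+)$", message_text, flags=re.MULTILINE | re.IGNORECASE
    )
    return match.group(1).strip() if match else None


def process_message(message_text):
    """Select a procedure based on the author (compare steps 200 to 224)."""
    author = extract_author(message_text)
    if author is not None and author.lower() in KNOWN_AUTHORS:
        # Author identified: procedure based on identity (step 222).
        return KNOWN_AUTHORS[author.lower()]
    # Author not identified: procedure for an unidentified author (step 224).
    return "unidentified_author_procedure"
```

A real system would likely need fuzzier name matching than
the exact lookup used here.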

[0040] As stated above, it is to be understood that it is
not necessary that the system 10 process the incoming
or retrieved message in real time (i.e., simultaneously
with the transcribing of the actual message) in every
instance. Processing may be performed by the
ASR/NLU module 24 subsequent to receiving the e-mail
or facsimile message data by retrieving the transcribed
message data from memory (step 196) (as indicated by
the dotted line in Fig. 3b).

[0041] In addition to the identity of the caller or author,
the system 10 may be further programmed by the user
12 to process an incoming telephone call or facsimile or
e-mail message based on the content and subject
matter of the call or message and/or the time of day in which
such call or message is received. Referring again to
Figs. 2, 3a and 3b, after receiving an incoming
telephone call or e-mail or facsimile message, or after
retrieving a recorded message from an answering
machine or voice mail, the server 20 sends the call or
message data to the ASR/NLU module 24. In the case
of voice data (e.g., telephone calls or messages
retrieved from voice mail or answering machine), the
ASR/NLU module 24 converts such data into symbolic
language or readable text. As stated above, e-mail
messages are directly processed (since they are in readable
text format) and facsimile messages are converted into
readable text (i.e., ASCII files) via the ASR/NLU module
24 using known optical character recognition (OCR)
methods. The ASR/NLU module 24 then analyzes the
call or message data by utilizing a combination of
speech recognition to extract certain keywords or topics
and natural language understanding to determine the
subject matter and content of the call (step 160 in Fig.
3a for telephone calls) or message (step 200 in Fig. 3b
for e-mails and facsimiles).

[0042] Once the ASR/NLU module determines the
subject matter of the call (step 170 in Fig. 3a) or the
message (step 220 in Fig. 3b), a signal is then sent to
the switching module 28 from the ASR/NLU module 24,
wherein the call or message is processed in accordance
with a pre-determined manner based on the subject
matter and content of the call (step 158 in Fig. 3a) or the
content of the message (step 228 in Fig. 3b). For
instance, if a message or call relates to an emergency
or accident, the switching module 28 may be
programmed to transfer the call immediately to a certain
individual.

[0043] In the event that the ASR/NLU module 24 is
unable to determine the subject matter or content of a
telephone call, the ASR/NLU module 24 sends a signal
to the speech synthesizer 36 which, in turn, sends a
message to the server 20, to prompt the caller to
articulate in a few words the reason for the call (step 180),
e.g., "What is the reason for your call?" Again, it is to be
understood that the system 10 may be programmed to
initially prompt the caller to state the reason for the call.
If the system 10 is still unable to determine the subject
matter of such call, the call may be processed in
accordance with a pre-programmed procedure based on
unknown matter (step 156). Likewise, if the subject
matter of an e-mail or facsimile message cannot be
determined (step 220), the message may be processed in
accordance with a pre-programmed procedure based
on unknown matter (step 226).
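
As a minimal sketch, and not the disclosed implementation,
the subject-matter routing with the caller prompt of step
180 and the unknown-matter fallback of step 156 might look
as follows. Simple keyword matching stands in for the
speech recognition and natural language understanding of
the ASR/NLU module 24; the keyword table, the procedure
labels and the prompt_caller callback are all assumptions
made for the example.

```python
# Hypothetical keyword table: subject -> words that suggest that subject.
SUBJECT_KEYWORDS = {
    "emergency": {"emergency", "accident", "urgent"},
    "billing": {"invoice", "payment", "bill"},
}

# Hypothetical pre-programmed procedure for each recognized subject.
PROCEDURES = {
    "emergency": "transfer_immediately",
    "billing": "send_to_voice_mail",
}


def determine_subject(transcript):
    """Return the first subject whose keywords occur in the transcript."""
    cleaned = transcript.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    for subject, keywords in SUBJECT_KEYWORDS.items():
        if words & keywords:
            return subject
    return None


def route_call(transcript, prompt_caller):
    """Route by subject; prompt once if unknown, then fall back."""
    subject = determine_subject(transcript)
    if subject is None:
        # Compare step 180: ask the caller to state the reason for the call.
        reply = prompt_caller("What is the reason for your call?")
        subject = determine_subject(reply)
    if subject is None:
        # Compare step 156: procedure based on unknown matter.
        return "unknown_matter_procedure"
    return PROCEDURES[subject]
```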
[0044] Further, in the event that an incoming call or
e-mail message is in a language foreign to the system 10
(i.e., foreign to the user), the ASR/NLU module 24 will
signal the language identifier/translator module 26 to
identify the particular language of the call or message,
and then provide the required translation to the
ASR/NLU module 24 so as to allow the system 10 to
understand the call and answer the caller in the proper
language. It is to be understood that the system 10 may
also be pre-programmed to process calls or messages
with an unknown language in a particular manner.
[0045] It is to be appreciated that any conventional
technique for language identification and translation
may be employed in the present invention, such as the
well-known machine language identification technique
disclosed in the article by Hieronymus J. and Kadambe
S., "Robust Spoken Language Identification using
Large Vocabulary Speech Recognition," Proceedings of
ICASSP 97, Vol. 2, pp. 1111, as well as the language
translation technique disclosed in Hutchins and Somers
(1992): "An Introduction to Machine Translation,"
Academic Press, London (encyclopedic overview).
[0046] In addition to the above references, language
identification can be performed using several statistical
methods. First, if the system 10 is configured to process
a small number of different languages (e.g., in Canada
where essentially only English or French are spoken),
the system 10 may decode the input text in each of the
different languages (using different ASR systems). The
several decoded scripts are then analyzed to find
statistical patterns (i.e., the statistical distribution of decoded
words in each script is analyzed). If the decoding was
performed in the wrong language, the perplexity of the
decoded script would be very high, and that particular
language would be excluded from consideration.
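
The perplexity comparison described above can be
sketched, under simplifying assumptions, as follows. The
sketch uses a unigram word model per language with a small
floor probability for unseen words; the disclosure does
not specify the language model, so that choice, and the
names perplexity and identify_language, are assumptions
made for the example.

```python
import math


def perplexity(words, unigram_probs, floor=1e-6):
    """Unigram perplexity of a decoded word sequence under one model.

    Words missing from the model receive the small `floor` probability,
    so a script decoded in the wrong language scores very high.
    """
    if not words:
        return float("inf")
    log_sum = sum(math.log(unigram_probs.get(w, floor)) for w in words)
    return math.exp(-log_sum / len(words))


def identify_language(decoded_scripts, language_models):
    """Return the language whose decoded script has the lowest perplexity.

    decoded_scripts maps language -> list of words decoded by that
    language's recognizer; language_models maps language -> word
    probabilities for that language.
    """
    scores = {
        lang: perplexity(script, language_models[lang])
        for lang, script in decoded_scripts.items()
    }
    return min(scores, key=scores.get)
```

Instead of returning only the minimum, a threshold on the
perplexity could be used to exclude languages one at a
time, which is closer to the wording of the paragraph
above.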
[0047] Next, language identification may be performed
on a phonetic level where the system recognizes a set
of phonemes (either using a universal phonetic system
or several systems for different languages). The system
then estimates the frequencies of the decoded
phoneme sequences for each language. If a particular
decoded sequence is unusual, the system would
exclude such language from consideration. There may
also be some sequences which are typical for a certain
language. Using such factors, the system will identify
the most probable language.
[0048] It is to be appreciated that the present invention
may utilize the identity of the caller to perform language
identification. Specifically, if the speaker profile of a
certain caller (which is stored in the system 10) indicates
that the caller speaks in a certain language, this
information may be a factor in identifying the language.
Conversely, if the system 10 identifies a particular language
using any of the above methods, the system 10 may
then determine the identity of a caller by searching the
speaker profiles to determine which speakers use such
identified language.

[0049] It is to be understood that both speech
recognition and natural language understanding may be
utilized by the ASR/NLU module 24 to process data
received from the server 20. The present invention
preferably employs the natural language understanding
techniques disclosed in U.S. Serial No. 08/859,586,
filed on May 20, 1997, and entitled: "A Statistical
Translation System with Features Based on Phrases or
Groups of Words," and U.S. Serial No. 08/593,032, filed
on January 29, 1996 and entitled "Statistical Natural
Language Understanding Using Hidden Clumpings,"
the disclosures of which are incorporated herein by
reference. The above-incorporated inventions concern
natural language understanding techniques for
parameterizing (i.e., converting) text input (using certain
algorithms) into language which can be understood and
processed by the system 10. For example, in the
context of the present invention, the ASR component of the
ASR/NLU module 24 supplies the NLU component of
such module with unrestricted text input such as "Play
the first message from Bob." Such text may be
converted by the NLU component of the ASR/NLU module
24 into "retrieve-message(sender=Bob,
message-number=1)." Such parameterized action can then be
understood and acted upon by the system 10.
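
To make the parameterization concrete, the following toy
sketch converts the example utterance into the example
action string. It uses a single hand-written pattern,
which is only a stand-in: the incorporated references
describe statistical techniques, and the function name
parameterize and the ORDINALS table are hypothetical.

```python
import re

# Hypothetical mapping from spoken ordinals to message numbers.
ORDINALS = {"first": 1, "second": 2, "third": 3}


def parameterize(utterance):
    """Convert 'Play the <ordinal> message from <name>' to an action string.

    Returns None when the utterance does not fit the single pattern
    handled by this sketch.
    """
    match = re.fullmatch(
        r"play the (\w+) message from (\w+)\.?",
        utterance.strip(),
        flags=re.IGNORECASE,
    )
    if match is None:
        return None
    ordinal, sender = match.group(1).lower(), match.group(2)
    if ordinal not in ORDINALS:
        return None
    return f"retrieve-message(sender={sender}, message-number={ORDINALS[ordinal]})"
```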
[0050] The known automatic speech recognition
functions disclosed in the article by Zeppenfeld, et al.,
entitled "Recognition of Conversational Telephone Speech
Using The Janus Speech Engine," Proceedings of
ICASSP 97, Vol. 3, pp. 1815, 1997; and the known
natural language understanding functions disclosed in the
article by K. Shirai and S. Furui, entitled "Special Issue
on Spoken Dialog," 15, (3-4) Speech Communication,
1994 may also be employed in the present invention.
Further, to simplify the programming of the ASR/NLU
module 24, the keyword spotting based recognition
methods as disclosed in "Word Spotting from
Continuous Speech Utterances," Richard C. Cross, Automatic
Speech and Speaker Recognition, Advanced Topics,
pp. 303-327, edited by Chin-Hui Lee, Frank K. Soong,
Kuldip K. Paliwal (Kluwer Academic Publishers), 1996
may preferably be used to guarantee that certain critical
messages are sufficiently handled.

[0051] It is to be appreciated that by utilizing natural
language understanding, as demonstrated above, the
system 10 is capable of performing interactive voice
response (IVR) functions so as to establish a dialog with
the user or caller to provide dialog management and
request understanding. This enables the system 10 to
be utilized for order taking and dialog-based form filling.
Further, such functions allow the caller to decide how to
process the call (assuming the system 10 is
programmed accordingly), i.e., by leaving an e-mail or
voice mail message, sending a page or transferring the
call to another telephone number. In addition, as
explained below, this allows the system 10 to be
remotely programmed by the user through voice
commands.

[0052] It is to be further appreciated that the system
10 provides security against unauthorized access to the
system 10. Particularly, in order for a user to have
access to and participate in the system 10, the user
must go through the system's enrollment process. This
process may be effected in various ways. For instance,
enrollment may be performed remotely by having a new
user call and enter a previously issued personal
identification number (PIN), whereby the server 20 can be
programmed to respond to the PIN which is input into the
system 10 via DTMF keys on the new user's telephone.
The system 10 can then build voice models of the new
user to verify and identify the new user when he or she
attempts to access or program the system 10 at a
subsequent time. Alternatively, either a recorded or a live
telephone conversation of the new user may be utilized to
build the requisite speaker models for future
identification and verification.

[0053] It is to be appreciated that the server 20 of the
present invention may be structured in accordance with
the teachings of patent application (IBM Docket Number
Y0997-313) entitled "Apparatus and Methods For
Providing Repetitive Enrollment in a Plurality of Biometric
Recognition Systems Based on an Initial Enrollment,"
the disclosure of which is incorporated by reference
herein, so as to make the speaker models (i.e.,
biometric data) of authorized users (which are stored in the
server 20) available to other biometric recognition
based systems to automatically enroll the user without
the user having to systematically provide new biometric
models to enroll in such systems.
[0054] The process of programming the system 10
can be performed by a user either locally, via a GUI
interface or voice commands, or remotely, over a
telephone line (voice commands) or through a network
system connected to the system. In either event, this is
accomplished through the programming interface 38.
As demonstrated above, programming the system 10 is
achieved by, e.g., selecting the names of persons who
should be transferred to a certain number, voice mail or
answering machine, by inputting certain keywords or
topics to be recognized by the system 10 as requiring
certain processing procedures and/or by programming
the system 10 to immediately connect emergency calls
or business calls between the hours of 8:00 a.m. and
12:00 p.m. As shown in Fig. 2, the programming
interface 38 sends such information to the server 20,
speaker recognizer module 22, ASR/NLU module 24,
language identifier/translator module 26, audio
indexer/prioritizer module 34 and the switching module
28, which directs the system 10 to process calls in
accordance with the user's programmed instructions.
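
One possible way to represent such programmed
instructions is an ordered list of rules, each of which
may constrain the caller, the subject matter and the time
of day. The sketch below is illustrative only: the Rule
class, the choose_action function, the action labels and
the first-match-wins ordering are assumptions, not details
taken from the disclosure. The time window mirrors the
8:00 a.m. to 12:00 p.m. example above.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class Rule:
    """One programmed instruction; unset fields match anything."""
    action: str
    caller: Optional[str] = None
    subject: Optional[str] = None
    start: Optional[time] = None
    end: Optional[time] = None

    def matches(self, caller, subject, received_at):
        if self.caller is not None and caller != self.caller:
            return False
        if self.subject is not None and subject != self.subject:
            return False
        if self.start is not None and self.end is not None:
            return self.start <= received_at <= self.end
        return True


def choose_action(rules, caller, subject, received_at, default="voice_mail"):
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(caller, subject, received_at):
            return rule.action
    return default


# Hypothetical rule set modelled on the examples in the paragraph above.
RULES = [
    Rule(action="connect_immediately", subject="emergency"),
    Rule(action="connect_immediately", subject="business",
         start=time(8, 0), end=time(12, 0)),
    Rule(action="answering_machine", caller="Mr. Smith"),
]
```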
[0055] The programming interface is responsive to
either DTMF key signals or voice commands by an
authorized user. The preferred method of programming
the system 10 is through voice activated commands via
a process of speech recognition and natural language
understanding, as opposed to DTMF keying or via GUI
interface. This process allows the system 10 to verify
and identify the user before the user is provided access
to the system 10. This provides security against
unauthorized users who may have knowledge of an
otherwise valid PIN. Specifically, before the user can program
the system 10 through voice commands, the user's
voice is first received by server 20, and then identified
and verified by the speaker recognizer module 22. Once
the user's identification is verified, the server 20 will
signal the programming interface 38 to allow the user to
proceed with programming the system 10.
[0056] The voice commands for programming the
system 10 are processed in the ASR/NLU module 24.
Particularly, during such programming, the ASR/NLU
module 24 is in a command and control mode, whereby
every voice instruction or command received by the
programming interface 38 is sent to the ASR/NLU module
24, converted into symbolic language and interpreted
as a command. For instance, if the user wants the
system 10 to direct all calls from his wife to his telephone
line, the user may state, e.g., "Immediately connect all
calls from my wife Jane," and the system 10 will
recognize and process such programming command
accordingly.

[0057] Moreover, the user can establish a dialog with
the system 10 through the ASR/NLU module 24 and the
speech synthesizer module 36. The user can check the
current program by asking the programming interface
38, e.g., "What calls are transferred to my answering
machine?" This query is then sent from the server 20 (if
the user is calling into the system 10 from an outside
line), or from the programming interface 38 via the
server 20 (if the user is in the office), to the ASR/NLU
module 24, wherein the query is processed. The
ASR/NLU module 24 will then generate a reply to the
query, which is sent to the speech synthesizer 36 to
generate a synthesized message, e.g., "All personal
calls are directed to your answering machine," which is
then played to the user.

[0058] Similarly, if the system 10 is unable to
understand a verbal programming request from an authorized
user, the ASR/NLU module 24 can generate a prompt
for the user, e.g., "Please rephrase your request," which
is processed by the speech synthesizer 36. Specifically,
during such programming, the server 20 sends a
programming request to the programming interface 38. If
the system 10 is unable to decipher the request, the
programming interface 38 sends a failure message
back to the server 20, which relays this message to the
ASR/NLU module 24. The ASR/NLU module 24 may
then either reprocess the query for a potential different
meaning, or it can prompt the user (via the speech
synthesizer 36) to issue a new programming request.
[0059] It is to be appreciated that the system 10 may
be programmed to manage various messages and calls
received via voice mail, telephone lines,
facsimile/modem, e-mail and other telecommunication devices
which are connected to the system 10 through the
operation of the audio indexer/prioritizer module 34. In
particular, the audio indexer/prioritizer module 34 may be
programmed to automatically sort and index such
messages and telephone conversations according to their
subject matter and content, origin, or both. The system
10 can preferably be further programmed so as to
prioritize certain calls and messages from a specific
individual.

[0060] Referring to Fig. 2, the audio indexing feature
of the system 10 works as follows. Once the caller is
identified and verified by the speaker recognizer module
22, the speaker recognizer module 22 signals the ID
tagger module 30 which automatically tags the identity
of the caller or the identity of the current speaker of a group
of participants to a teleconference. Simultaneously with
the ID tagging process, the transcriber module 32
transcribes the telephone conversation or message. The
tagging process involves associating the transcribed
message with the identity of the caller or speaker. For
instance, during teleconferences, each segment of the
transcribed conversation corresponding to the current
speaker is tagged with the identity of such speaker
together with the begin time and end time for each such
segment.
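
The tagging of transcribed segments with speaker identity,
begin time and end time might be represented as in the
sketch below. The TaggedSegment record, the tag_segments
function and the input shapes (speaker turns as tuples,
transcript pieces with timestamps in seconds) are
assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaggedSegment:
    """A transcribed segment tagged with its speaker and time span."""
    speaker: str
    begin: float  # seconds from the start of the call
    end: float
    text: str


def tag_segments(speaker_turns, transcript_pieces):
    """Associate transcribed text with the speaker of each turn.

    speaker_turns: list of (speaker, begin, end) tuples.
    transcript_pieces: list of (timestamp, text) tuples.
    A piece belongs to a turn when begin <= timestamp < end.
    """
    segments = []
    for speaker, begin, end in speaker_turns:
        text = " ".join(
            piece for stamp, piece in transcript_pieces if begin <= stamp < end
        )
        segments.append(TaggedSegment(speaker, begin, end, text))
    return segments
```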

[0061] The information processed in the ID tagger
module 30 and the transcriber module 32 is sent to the
audio indexer/prioritizer module 34, wherein the
received information is processed and stored according
to a pre-programmed procedure. The audio
indexer/prioritizer module 34 can be programmed to index the
messages and conversations in any manner that the
user desires. For instance, the user may be able to
either retrieve the messages from a certain caller,
retrieve all urgent messages, or retrieve the messages
that relate to a specific matter. Further, the audio
indexer/prioritizer module 34 can be programmed to
prioritize calls from a caller who has either left numerous
messages or has left urgent messages.
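
A minimal sketch of such retrieval and prioritization is
given below, assuming each stored message is a dictionary
with "caller", "urgent" and "subject" keys. That record
layout, the function names and the threshold of three
messages for "numerous" are assumptions; the disclosure
leaves these details to the user's programming.

```python
from collections import Counter


def retrieve(messages, caller=None, urgent=None, subject=None):
    """Return messages matching every criterion that is not None."""
    return [
        m for m in messages
        if (caller is None or m["caller"] == caller)
        and (urgent is None or m["urgent"] == urgent)
        and (subject is None or m["subject"] == subject)
    ]


def prioritize(messages, many=3):
    """Move messages that are urgent, or from callers who left at least
    `many` messages, to the front, keeping the original order otherwise."""
    counts = Counter(m["caller"] for m in messages)

    def is_priority(m):
        return m["urgent"] or counts[m["caller"]] >= many

    # sorted() is stable, and False sorts before True, so priority
    # messages come first without disturbing the relative order.
    return sorted(messages, key=lambda m: not is_priority(m))
```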
[0062] The information stored in the audio
indexer/prioritizer module 34 can then be accessed and retrieved
by the user either locally or remotely. When such
information is accessed by the user, the audio
indexer/prioritizer module 34 sends the requested information to the
speech synthesizer module 36, wherein a
text-to-speech conversion is performed to allow the user to
hear the message in the form of synthesized speech. It
is to be understood that any conventional speech
synthesizing technique may be utilized in the present
invention such as the Eloquent engine provided with the
commercially available IBM VIAVOICE GOLD software.
[0063] It is to be appreciated that information may be
retrieved from the audio indexer/prioritizer module 34
through various methods such as via GUI interface,
PINs and DTMF keying. The preferred method in the
present invention for retrieving such information,
however, is through voice activated commands. Such
method allows the system 10 to identify and verify the
user before providing access to the messages or
conversations stored and indexed in the audio
indexer/prioritizer module 34. The audio indexer/prioritizer module
34 can be programmed to recognize and respond to
certain voice commands of the user, which are
processed by the ASR/NLU module 24 and sent to the audio
indexer/prioritizer module 34, in order to retrieve certain
messages and conversations. For example, the user
may retrieve all the messages from Mr. Smith that are
stored in the audio indexer/prioritizer module 34 through
a voice command, e.g., "Play all messages from Mr.
Smith." This command is received by the server 20 and
sent to the ASR/NLU module 24 for processing. If the
ASR/NLU module 24 understands the query, the
ASR/NLU module 24 sends a reply back to the server
20 to process the query. The server 20 then signals the
indexer/prioritizer module 34 to send the requested
messages to the speech synthesizer to generate
synthesized e-mail or facsimile messages, or directly to the
server 20 for recorded telephone or voice mail
messages, which are simply played back.
[0064] It is to be appreciated that various alternative
programming strategies to process calls may be
employed in the present invention by one of ordinary
skill in the art. For instance, the system 10 may be
programmed to warn the user in the event of an important
or urgent incoming telephone call. Specifically, the
system 10 can be programmed to notify the user on a
display, thereby allowing the user to make his own decision
on how to handle such call, or to simply process the call,
as demonstrated above, in accordance with a
pre-programmed procedure. Moreover, the system 10 can be
programmed to forward an urgent or important call to
the user's beeper when the user is not home or is out of
the office. The user may also program the system 10 to
dial a sequence of telephone numbers (after answering
an incoming telephone call) at certain locations where
the user may be found during the course of the day.
Furthermore, the sequence (i.e., list) of pre-programmed
telephone numbers may be automatically updated by
the system 10 in accordance with the latest known
location where the user is found. If the user desires, such list
may also be accessible by individuals who call into the
system 10 so that such callers can attempt to contact the
user at one of the various locations at their
convenience.
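
The sequential dialing with automatic list updating can be
sketched as below. Moving the number at which the user was
last reached to the front of the list is one plausible
reading of "updated in accordance with the latest known
location" and is an assumption of this example, as are the
function name find_user and the answers_at callback that
stands in for actually placing a call.

```python
def find_user(numbers, answers_at):
    """Dial numbers in order until the user answers.

    Returns (number_reached, updated_list). On success the reached
    number is moved to the front so it is tried first next time; on
    failure the list is returned unchanged and number_reached is None.
    """
    for number in numbers:
        if answers_at(number):
            updated = [number] + [n for n in numbers if n != number]
            return number, updated
    return None, list(numbers)
```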

[0065] In addition, it is to be appreciated that the
system 10 may be programmed to store the names of all
persons who call the system 10, together with their
telephone numbers (using ANI), as well as e-mail
addresses of persons who send electronic mail. This
allows the user of the system 10 to automatically reply
to pending calls or messages without having to first
determine the telephone number or e-mail addresses of
the person to whom the user is replying. Further, such
programming provides for dynamically creating a
continuously up-to-date address book which is accessible
to an authorized user to send messages or make calls.
Specifically, the user can access the system 10, select
the name of a particular person to call, and then
command the system 10 to send that person a certain
message (e.g., e-mail or facsimile).
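
A dynamically built address book of this kind could be as
simple as the following sketch. The AddressBook class and
its method names are hypothetical, and the sketch ignores
access control, which the disclosure handles through
speaker verification.

```python
class AddressBook:
    """Address book that grows as calls and e-mails are received."""

    def __init__(self):
        self._entries = {}

    def record_call(self, name, phone_number):
        """Store or update the telephone number (e.g., from ANI)."""
        self._entries.setdefault(name, {})["phone"] = phone_number

    def record_email(self, name, address):
        """Store or update the e-mail address of a message author."""
        self._entries.setdefault(name, {})["email"] = address

    def lookup(self, name):
        """Return a copy of the contact details known for a name."""
        return dict(self._entries.get(name, {}))
```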
[0066] Furthermore, the system 10 may be
programmed to allow the callers to access and utilize
specific functions of the system 10. For instance, the
system 10 may offer the caller the option to schedule a
tentative appointment with the user, which may then be
stored in the system 10 and then subsequently
accepted or rejected by the user. The caller may also be
afforded the opportunity to choose the method by which
the user may confirm, reject or adjourn such
appointment (e.g., telephone call, facsimile or e-mail).
Additionally, the system 10 may be programmed to provide
certain authorized callers with access to the user's
appointment calendar so that such appointments may
be easily scheduled.

[0067] It is to be further appreciated that the present
invention may be employed in a small scale application
for personal home use, or employed in large scale
office or corporate applications. It is to be further
appreciated by one of ordinary skill in the art that the system
10 may be utilized in other applications. For instance, by
utilizing the NLU feature of the system 10, the system
10 may be connected to devices such as tape
recorders, radios and televisions so as to warn the user
whenever a certain topic is being covered on some channel
or if a particular person is being interviewed. It is to be
understood that the system 10 is not limited to
telephone communications. It is possible to use the system
10 for web phones, net conversations, teleconferences
and other various voice communications which involve
the transmission of voice through a digital or analog
channel. Additional electronic information such as
ASCII characters, facsimile messages and the content
of web pages and database searches can also be
processed in the same manner. For example, by adding
optical character recognition (OCR) with facsimile receiving
capabilities, the system 10 is able to transcribe the
content of messages received by facsimile or e-mail to be
stored in the audio indexer/prioritizer 34. As
demonstrated above, the user may then retrieve these
messages through the speech synthesizer 36 to hear the
content of such messages.

[0068] In sum, the present invention provides a
programmable call and message processing system which
can be programmed by a user to process incoming
telephone calls, e-mail messages, facsimile messages
and other electronic information data in a
predetermined manner without the user having to first manually
answer a telephone call or retrieve an e-mail or facsimile
message, identify the caller or the author of the
message, and then decide how to transfer such call or
respond to such message. The present invention can be
programmed to transcribe telephone conversations or
teleconferences, tag the identity of the caller or
participants to the teleconference, and store such messages
and conversations according to the identity of the caller
or author and/or the subject matter and content of the
call or message. The user may then retrieve any stored
message or conversation based on the identity of the
caller or a group of related messages based on their
subject matter.
Further features of the invention may be as follows: 
[0069] The server means further receives, and is
responsive to, one of an incoming facsimile message,
e-mail message, voice data, data convertible to text and a
combination thereof.

[0070] The speaker recognition means is based on
text-independent speaker recognition.
[0071] The speech recognition means utilizes speech
recognition and natural language understanding to
determine said subject matter and content of said call.
[0072] The system includes language identification
means, operatively coupled to said speech recognition
means, for identifying and understanding languages of
said incoming call.

[0073] The identification means performs language
translation.

[0074] The identity of said caller is determined from
said identified language of said call.
[0075] The language identification means uses
identity of said caller to identify language of said call.
[0076] Enrollment means are included for enrolling a
new user to have access to said system.
[0077] The new user may be self-enrolled.



[0078] Means are provided for determining a time of
said call and wherein said system may be further
programmed to process said call in accordance with said
time of said call.

[0079] The programming means includes one of a
GUI interface, a voice interface, a programming
configuration file, and a combination thereof.
[0080] The programming may be performed one of
locally, remotely and a combination thereof.
[0081] Means are provided, responsive to said
incoming call, for dynamically creating an address book.
[0082] Means are provided for accessing said address
book to send a message to a selected person.
[0083] Processing of said call includes transferring an
incoming telephone call to a plurality of different
telephone numbers one of sequentially and simultaneously.
[0084] Means are provided for prompting the caller to
identify him/herself and the subject matter of said call.
Said prompting is performed when said system cannot
determine either said identity or said subject matter of
call.

[0085] Alternately said prompting is performed when
said call is received to determine said identity of said
caller and subject matter of said call.
[0086] May further comprise means, operatively con- 
nected to said transcribing means, for drctating mes- 
sages from a user of said system and sending said 
message to a selected person. The message may be 
sent by one of a focsimile, e-mail or telephone call, and 
a combination thereof, to said selected person. 
[0087] May ffurttier comprise means for adding mood 
stamps or urgency/conf identiality stamps in a header in 
one of said facsimile and e-mail. 
[P088] The step of determining said identity off sakJ 
caller may be performed by text-independent speaker 
recognition. 

[0089] The step off deternvning said subject matter off 
said call may be performed by speech recognition and 
natural language understanding. 
[0090] The metixxl may include the step off translating 
said can into a language other than tiiat of saki call. 
[0091 ] The incoming call may be recorded. 
[0092] Recording is performed simultaneously with 
sakJ step of determining klentity of said caller and m^ 
be performed prior to sakJ step detenmining identity off 
said caller. 

[0093] The method may further comprise the steps of:
determining a time of said call; and processing said call
based on said determined time of said call.
[0094] The step of retrieving said indexed information
is performed by voice commands.
[0095] The method may include determining the time
of one of said call and message; and processing one of
said call and message in accordance with said determined
time.
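
Processing by caller identity, subject matter and time, as in paragraphs [0093] and [0095], can be pictured as a pre-programmed rule table. The sketch below is an assumption about one possible representation: the `Rule` class, `choose_action`, the action names, the sample callers and the `voice_mail` default are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class Rule:
    """One user-programmed rule; a None field matches any value."""
    action: str
    caller: Optional[str] = None
    subject: Optional[str] = None
    start: time = time(0, 0)
    end: time = time(23, 59, 59)

    def matches(self, caller, subject, at):
        return (
            (self.caller is None or self.caller == caller)
            and (self.subject is None or self.subject == subject)
            and self.start <= at <= self.end
        )


def choose_action(rules, caller, subject, at, default="voice_mail"):
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(caller, subject, at):
            return rule.action
    return default


# Hypothetical programming of the system (illustration only).
RULES = [
    Rule("forward_to_mobile", caller="Alice", start=time(9), end=time(17)),
    Rule("hold", subject="billing", start=time(9), end=time(17)),
    Rule("disconnect", caller="Telemarketer"),
]

daytime = choose_action(RULES, "Alice", "lunch", time(10, 30))
evening = choose_action(RULES, "Alice", "lunch", time(20, 0))
```

Because rules are checked in order, placing more specific rules first gives them priority over general ones.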
Claims

1. An automatic call and data transfer processing system,
comprising: server means (20) for receiving an
incoming call; characterised by:

speaker recognition means (22), operatively
coupled to said server means, for identifying
caller of said call;

speech recognition means (24), operatively
coupled to said server means, for determining
subject matter and content of said call;

switching means (28), responsive to said
speaker recognition means and speech recognition
means, for processing said call in accordance
with one of said identification of said
caller and determined subject matter; and

programming means (38), operatively coupled
to said server means, said speaker recognition
means, said speech recognition means and
said switching means for programming system
to perform said processing.

2. A system of claim 1, characterised in that the server
means includes means for recording (40) said
incoming call.

3. A system of claim 2, characterised in that said
server means further includes means (42) for compressing
and storing said recorded data and means
for decompressing said compressed data.

4. A system of claim 1, 2 or 3 further characterised by
identification tagging means (30), responsive to
said speaker recognition means, for automatically
tagging said identity of said caller; transcribing
means (32), responsive to said speech recognition
means, for transcribing a telephone conversation or
message of said caller; and audio indexing means
(34), operatively coupled to said identification tagging
means and said transcribing means, for indexing
said messages and said conversations of said
caller according to subject matter of said conversation
and said message and the identity of said
caller.

5. A system of claim 4 further characterised by means
for retrieving (118) said indexed messages from
said audio indexing means.

6. A system of claim 2, 4 or 5, further characterised by
speech synthesizer means (36) operatively coupled
to said server means, said speech recognition
means and said audio indexing means, for converting
information stored in said audio indexing means
into synthesized speech.

7. A method for providing automatic call or message
data processing, characterised by determining the
identity of said caller (130) from an incoming call;
determining the subject matter of said call (170);
processing (152, 154, 156, 158) said call in accordance
with one of said identity of said caller and subject
matter of said call.

8. A method for providing automatic call or message
data processing, comprising the steps of: receiving
one of an incoming call and message data (100);
identifying a caller of said call if an incoming call is
received (130) and determining subject matter of
said call (160); identifying an author of said message
if message data is received and determining
subject matter of said message; processing (152,
154, 156, 158) one of said call and message in
accordance with one of said identity of said caller
and author and said subject matter of said call and
message.

9. The method further characterised by the steps of:
tagging said determined identity of one of said
caller and said author; transcribing said determined
subject matter of one of said call and said message;
indexing the information resulting from said tagging
and said transcribing in accordance with one of said
determined subject matter, said determined identity
and a combination thereof.

10. A method of claim 9 characterised by retrieving
said indexed information and converting said
indexed information into synthesized speech.
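
The tagging, indexing and retrieving steps recited in claims 4, 5, 9 and 10 can be pictured with a small in-memory index keyed by caller identity and by subject matter. The following Python sketch is illustrative only: the `Entry` and `MessageIndex` names, the exact-match lookup and the sample data are assumptions made here, and speaker recognition, transcription and speech synthesis are outside its scope.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A transcribed call or message tagged with identity and subject."""
    identity: str
    subject: str
    transcript: str


class MessageIndex:
    """Index entries by identity and by subject for later retrieval."""

    def __init__(self):
        self._by_identity = defaultdict(list)
        self._by_subject = defaultdict(list)

    def add(self, entry):
        self._by_identity[entry.identity].append(entry)
        self._by_subject[entry.subject].append(entry)

    def retrieve(self, identity=None, subject=None):
        """Look up by identity, by subject, or by both combined."""
        if identity is not None and subject is not None:
            return [e for e in self._by_identity.get(identity, [])
                    if e.subject == subject]
        if identity is not None:
            return list(self._by_identity.get(identity, []))
        if subject is not None:
            return list(self._by_subject.get(subject, []))
        return []


# Hypothetical sample data (illustration only).
index = MessageIndex()
index.add(Entry("Alice", "budget", "The budget draft is ready."))
index.add(Entry("Bob", "budget", "Can we move the budget review?"))
index.add(Entry("Alice", "travel", "My flight lands at noon."))

from_alice = index.retrieve(identity="Alice")
about_budget = index.retrieve(subject="budget")
alice_budget = index.retrieve(identity="Alice", subject="budget")
```

Each retrieved transcript could then be handed to a speech synthesizer, as claim 10 describes for converting indexed information into synthesized speech.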